Medical Decision Making — Latest Matching Preprints

1

Delay discounting and low-value care decision-making by primary care clinicians in a survey-based vignette experiment

Epling, J. W.; King, M. J.; Rockwell, M.; Tegge, A. N.; Hester, C. M.; Clay, T. L.; Callen, E. F.; Turner, J. K.; Stein, J.

2026-07-13 health systems and quality improvement 10.64898/2026.07.09.26357617 medRxiv

Top 0.1%

7.9%

Introduction: Primary care clinicians (PCC) commonly make decisions in the context of time delay and uncertainty. Delay discounting (DD) and probability discounting (PD) are cognitive biases related to delay and uncertainty that have been minimally explored in PCC. We assessed DD and PD in PCC and evaluated their association with low-value care (LVC) decision-making. Methods: We administered a survey to PCC in a Southeastern U.S. health system and within the American Academy of Family Physicians networks. The survey comprised standardized psychometric assessments of DD and PD and four LVC clinical vignettes. Outcomes included DD and PD rates for two monetary rewards ($100 and $10,000) and ratings of LVC likelihood (0-100). We used regression analysis with model selection to evaluate the relationships between variables. Results: 225 PCC (89% physicians, 11% advanced practice providers) participated. DD and PD rates were heterogeneous. For the $10,000 reward, ln k (DD) = -6.80 (IQR: -7.60 to -6.10) and ln h (PD) = 1.75 (IQR: 1.75 to 2.36). Reward amount affected DD and PD in opposite directions (i.e., lower DD and higher PD rates for $10,000 vs. $100). LVC likelihood was highest for low-value antibiotics and lowest for low-value cervical cancer screening (median 20, IQR: 10-40, and median 0, IQR: 0-10, respectively). Model selection revealed demographic associations with LVC likelihood but no association with DD or PD. Conclusions: Consistent with effects previously reported in non-clinicians, PCC exhibited a range of DD and PD rates that varied with reward magnitude. Neither DD nor PD predicted vignette-based LVC likelihood. Further research should investigate actual clinical practice patterns and other LVC scenarios.
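To make the reported rates concrete, here is a minimal Python sketch assuming the standard hyperbolic forms V = A / (1 + kD) for delay discounting and V = A / (1 + h * theta), with theta = (1 - p) / p as the odds against receipt, for probability discounting. The abstract reports only ln k and ln h, so these functional forms, the delay units, and the function names are assumptions for illustration. Under this model, 1/k is the delay at which a reward loses half its subjective value, so a lower k (as reported for $10,000 vs. $100) means shallower discounting.

```python
import math


# Assumption: standard hyperbolic discounting forms; the abstract reports only ln k and ln h.
def delay_discounted_value(amount, delay, ln_k):
    """Subjective value of `amount` received after `delay` (delay units are assumed)."""
    k = math.exp(ln_k)
    return amount / (1.0 + k * delay)


def probability_discounted_value(amount, p, ln_h):
    """Subjective value of `amount` received with probability p (0 < p <= 1)."""
    h = math.exp(ln_h)
    odds_against = (1.0 - p) / p
    return amount / (1.0 + h * odds_against)


# Point estimates reported for the $10,000 reward
ln_k_dd, ln_h_pd = -6.80, 1.75
half_value_delay = 1.0 / math.exp(ln_k_dd)  # delay at which value falls to half
print(delay_discounted_value(10_000, half_value_delay, ln_k_dd))
print(probability_discounted_value(10_000, 0.5, ln_h_pd))
```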

2

Using fragmented data to characterize community healthcare utilization

McCready, T.; Thorpe, L.; Roy, B.; Renson, A.

2026-07-15 health systems and quality improvement 10.64898/2026.07.13.26357976 medRxiv

Top 0.1%

4.8%

Community-level estimates of healthcare utilization are essential for identifying inequities, allocating resources, and evaluating place-based interventions. However, in the United States, no single data source adequately captures healthcare utilization within geographically defined populations. Population-based surveys often lack sufficient geographic resolution, insurance claims represent only covered populations, and electronic health records are limited to care delivered within participating health systems. Increasingly, researchers combine these fragmented data sources, yet limited guidance exists for conducting valid population-based descriptive analyses using incomplete and overlapping data. We review the strengths and limitations of major data sources used to characterize community healthcare utilization and propose an approach for conducting population-based descriptive analyses using fragmented data. Rather than focusing on the limitations of individual data sources, our approach begins by explicitly defining the target population and the ideal observational study that would answer the research question. Available data sources are then conceptualized as incomplete or imperfect realizations of that ideal, providing a structured approach to (a) identifying sources of selection bias, missingness, and measurement error, (b) articulating required assumptions, and (c) selecting appropriate analytic strategies. We illustrate our approach using colorectal cancer screening utilization among adults residing in Brooklyn, New York during 2022. By shifting attention from individual data sources to the target community and the assumptions required for valid inference, this approach provides a practical framework for strengthening descriptive analyses of community healthcare utilization and informing place-based public health research, policy, and practice.

3

Reinforcement Learning for Chronic Care Pathway Optimization: A Unified Framework across Three Clinical Goal Types

Wang, R.; Chen, H.; Wu, Y.; Li, Z.; Shen, R.; He, F.; Zhao, S.; Zheng, N.

2026-07-06 health informatics 10.64898/2026.07.03.26357209 medRxiv

Top 0.1%

3.3%

Objective: Chronic care requires sequential treatment under competing biomarker, safety, and cost constraints, yet clinical goal structures differ across diseases. We asked whether one physiology-informed reinforcement learning (RL) paradigm adapts to heterogeneous chronic-care goals without disease-specific policy architectures. Materials and Methods: We formalized a Type A/B/C clinical goal taxonomy (target cure, stable cruise, cycle completion) as a Physiology-Informed Markov Decision Process registry for gout, chronic kidney disease (CKD), and PCOS-mediated fertility treatment, each with PK/PD transitions, discrete actions, safety zones, and guideline-based doctor baselines. Policies were trained with a unified behavior cloning (BC) -> proximal policy optimization (PPO) pipeline (GAE lambda = 0.95) on 500 simulated trajectories per disease. Evaluation used paired seeds (N=50 primary; N=500 for bootstrap 95% CIs), 10-seed robustness checks, ablation, literature-based serum uric acid (sUA) calibration, and out-of-distribution stress tests, with McNemar/Wilcoxon tests and Benjamini-Hochberg FDR correction. Results: PCOS (Type C, primary): PPO 72.0% vs. doctor 54.0% at N=50 (+18 percentage points; FDR-significant); at N=500, PPO 69.8% [65.6, 73.8] vs. doctor 52.8% [48.8, 57.2]. Gout (Type A): PPO was non-inferior (88.0% vs. 90.0%; McNemar p=1.0). CKD (Type B): doctor 32.0%, BC/PPO 38.0%. Offline conservative Q-learning (CQL) reached 92.0% on gout trajectories. PK recalibration RMSE was 97.4 µmol/L (r=0.809). Conclusions: Shared BC->PPO training generalizes across three goal types without cross-disease weight sharing. PCOS supports RL for bounded cycles; gout confirms guideline non-inferiority; CKD illustrates cruise-control difficulty. This framework offers a reproducible foundation for chronic pathway optimization pending prospective validation.
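The lambda = 0.95 above is the generalized advantage estimation (GAE) parameter that PPO uses to trade bias against variance in its advantage estimates. A minimal sketch of standard GAE over one trajectory follows; the discount factor gamma = 0.99 is an assumption (the abstract does not report it), and the function is an illustration of the general technique, not the authors' code.

```python
import numpy as np


def gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one trajectory.

    rewards, values, dones: per-step arrays of equal length.
    last_value: value estimate for the state after the final step.
    gamma: discount factor (assumed; not reported in the abstract).
    lam: GAE lambda (0.95 in the abstract).
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    dones = np.asarray(dones, dtype=float)
    advantages = np.zeros_like(rewards)
    next_value = float(last_value)
    running = 0.0
    for t in reversed(range(len(rewards))):
        nonterminal = 1.0 - dones[t]  # stop bootstrapping at episode end
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]  # TD error
        running = delta + gamma * lam * nonterminal * running
        advantages[t] = running
        next_value = values[t]
    returns = advantages + values  # regression targets for the critic
    return advantages, returns
```

With lam = 0 the advantages reduce to one-step TD errors; with lam = 1 and gamma = 1 they become Monte Carlo returns minus the value baseline.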

4

Decision support for preventing elective surgery cancellations: cost-sensitive risk ranking with cross-site validation in the NHS

Chizari, H.; Peter, N.; Lin, B.; Malekinezhad, F.; Pietroni, M.

2026-07-06 health systems and quality improvement 10.64898/2026.07.03.26357241 medRxiv

Top 0.1%

2.6%

Elective surgery late cancellations and "did not attend" (LCDNA) events waste theatre capacity, lengthen waiting lists, and impose avoidable costs on NHS Trusts. We present a decision-support approach that ranks upcoming elective procedures by expected cancellation cost and supports capacity-constrained outreach by selecting the highest-risk Top-K cases for intervention. Using cost-sensitive learning and a clinically grounded cost model, the policy reduces expected cost from approximately 103 GBP per case under business-as-usual to 77.08 GBP per case in a hospital-holdout (cross-site) evaluation designed to mimic deployment to a new hospital. In a complementary time-forward evaluation, representing prospective use within the same service environment, expected cost falls further to 70.97 GBP per case. The 6.11 GBP per-case difference between the two regimes highlights the added uncertainty introduced by cross-site operational shift and supports a conservative roll-out with local calibration and monitoring. Explainability analyses suggest that booking-to-procedure lead time, specialty or service line, calendar effects, and prior cancellation history are the strongest drivers of prediction, helping to inform tiered intervention workflows that prioritise near-term bookings and use model-pathway mismatches as an audit signal. Overall, the framework turns predictive performance into practical, capacity-aware policy guidance for reducing avoidable cancellations while supporting safe and equitable implementation.
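The Top-K policy can be sketched as ranking cases by expected cancellation cost (predicted probability times the cost incurred if the case cancels) and intervening on the K highest. This Python sketch is illustrative only: the per-case costs, the outreach risk reduction, and the outreach cost are hypothetical inputs, not the authors' clinically grounded cost model.

```python
import numpy as np


def select_top_k(p_cancel, cost_if_cancelled, k):
    """Rank cases by expected cancellation cost and return the indices of the top k."""
    expected_cost = np.asarray(p_cancel, float) * np.asarray(cost_if_cancelled, float)
    order = np.argsort(-expected_cost, kind="stable")  # highest expected cost first
    return order[:k], expected_cost


def mean_cost_under_policy(expected_cost, selected, risk_reduction, outreach_cost):
    """Mean per-case cost if outreach to `selected` cases cuts their risk by `risk_reduction`.

    Both `risk_reduction` and `outreach_cost` are hypothetical inputs.
    """
    cost = np.array(expected_cost, dtype=float)  # copy so the input is not modified
    cost[selected] = cost[selected] * (1.0 - risk_reduction) + outreach_cost
    return cost.mean()


selected, expected = select_top_k([0.1, 0.5, 0.3], [1000, 100, 500], k=1)
print(selected, mean_cost_under_policy(expected, selected, risk_reduction=0.5, outreach_cost=10))
```

Ranking on expected cost rather than raw probability means a moderately likely cancellation of an expensive theatre slot can outrank a likely cancellation of a cheap one.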

5

Simulation of synthetic health records for assessment of causal inference methods for vaccine efficacy

Velasco Pardo, V.; Daines, L.; Katikireddi, S. V.; Ritchie, L.; Robertson, C.; Simpson, C. R.; McCowan, C.; Swallow, B.

2026-07-19 infectious diseases 10.64898/2026.07.17.26358308 medRxiv

Top 0.1%

2.4%

Background During the COVID-19 pandemic, public health agencies used near real-time observational data to answer questions regarding vaccine effectiveness. However, traditional observational methods do not allow conclusions regarding counterfactual scenarios to be drawn from clinical data. Counterfactuals, which are outcomes that would have occurred under alternative interventions, can be used to formally assess the causal effects of public health interventions on health outcomes while accounting for the effects of confounding. Ideally, individual patient data are used to develop counterfactuals. Low-fidelity synthetic data may be useful for advancing methodological development where governance and privacy constraints prohibit access to sensitive personal data. Methods We simulated synthetic datasets based on the EAVE-II COVID-19 platform, whose use has been limited to surveillance purposes. EAVE-II includes almost all residents of Scotland registered with qualified general medical practitioners. Patient characteristics were simulated to reflect the known distribution of the Scottish population, accounting for dependencies between variables. Each synthetic dataset encoded a different realistic scenario based on EAVE-II 'ground truth' vaccine rollout and effectiveness results, explicitly specifying the causal and confounding mechanisms using a statistically sound method based on marginal structural models. Synthetic datasets of 100,000 individuals were then generated across five confounding scenarios and five severe outcome types. Results In scenarios with weak confounding, both unweighted and inverse probability of treatment weighted (IPTW) logistic regression recovered the true causal parameters. As confounding strength increased, only weighted models recovered the true mechanism.
Conclusions Low-fidelity synthetic datasets simulated from EAVE-II allow data analysts to build and test causal inference pipelines, develop novel analysis pipelines, and train new researchers while awaiting access to real data. We showed how to generate synthetic datasets from a marginal structural model under different confounding scenarios.
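The weak-versus-strong confounding result can be illustrated with a toy simulation. This sketch assumes a single binary confounder L that raises both the chance of vaccination A and the risk of a severe outcome Y, with a true marginal risk difference of -0.05 built in. For brevity it swaps the abstract's weighted logistic regression for a weighted difference in risks, with propensity scores estimated within strata of L; all names and parameter values are hypothetical.

```python
import numpy as np


def simulate_cohort(n, rng):
    """Toy data: binary confounder L drives both vaccination A and outcome Y.

    True marginal risk difference for A is -0.05 by construction.
    """
    L = rng.binomial(1, 0.5, n)
    A = rng.binomial(1, np.where(L == 1, 0.8, 0.2))
    Y = rng.binomial(1, 0.1 + 0.3 * L - 0.05 * A)
    return L, A, Y


def crude_risk_difference(A, Y):
    return Y[A == 1].mean() - Y[A == 0].mean()


def iptw_risk_difference(L, A, Y):
    # Propensity score estimated nonparametrically within each level of L
    ps = np.empty(len(A), dtype=float)
    for level in np.unique(L):
        mask = L == level
        ps[mask] = A[mask].mean()
    w = np.where(A == 1, 1.0 / ps, 1.0 / (1.0 - ps))  # inverse probability of treatment weights
    risk1 = np.sum(w * A * Y) / np.sum(w * A)
    risk0 = np.sum(w * (1 - A) * Y) / np.sum(w * (1 - A))
    return risk1 - risk0


rng = np.random.default_rng(2026)
L, A, Y = simulate_cohort(200_000, rng)
print(round(crude_risk_difference(A, Y), 3))  # confounding pushes this well above -0.05
print(round(iptw_risk_difference(L, A, Y), 3))  # should land near the true -0.05
```

Here the crude contrast even has the wrong sign, because vaccinated people are concentrated in the high-risk stratum; weighting recreates a pseudo-population in which L no longer predicts A.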

6

Proposed Context-of-Use Evaluation Framework for Medication Management Tasks Completed by Generative Artificial Intelligence

Henry, K.; Blotske, K.; Smith, B.; Li, T.; Gao, Y.; Zhao, X.; Liu, T.; Sikora, A.

2026-06-29 health systems and quality improvement 10.64898/2026.06.26.26356706 medRxiv

Top 0.1%

2.1%

Background: Standardized evaluation of agentic artificial intelligence (AI) for medication management is lacking. Given the potential lethality of medication errors endorsed or missed by AI, performance evaluation constructs are essential. The purpose of this evaluation was to develop a standardized grading framework for performance evaluation of medication management tasks. Methods: A mixed-methods approach was undertaken that included literature evaluation for standards and best practices of comprehensive medication management (CMM), panel discussions, and iterative application to a set of cases. The goal was to develop a grading framework that effectively evaluates domains such as safety, factuality, and clinical relevance and that can be employed across a broad range of medication domains (e.g., electrolyte replacement, antibiotic selection). Inter-rater reliability, measured with Krippendorff's alpha, was the primary outcome. Results: A total of 5 panelists developed the CMM Evaluation Framework, which includes 4 dimensions: safety, factuality, completeness, and preference. These dimensions are applied to three CMM skills: collecting patient data, analyzing information, and designing regimens. Each dimension is rated from 1 to 5. An additional dimension evaluated the presence of hallucinations and errors with high harm scores (i.e., absolute failure criteria regardless of the overall score). Krippendorff's alpha was highest in the medication therapy problem and medication therapy format categories across 50 pneumonia cases run in triplicate (150 total). Conclusions: This framework is informed by national standards for CMM and by the healthcare professionals dedicated to providing this service. These domains allow for practice variation via the preference domain while maintaining strong guardrails against the commission of medication errors. Further analyses beyond pilot testing are necessary.
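For readers unfamiliar with the primary outcome, Krippendorff's alpha is 1 - D_o / D_e, the ratio of observed within-case disagreement to the disagreement expected by chance across all pooled ratings. The sketch below assumes an interval metric (squared differences) and complete data, with every case rated by every rater; the abstract does not state which metric was used, and ordinal or nominal versions weight disagreements differently.

```python
import numpy as np


def krippendorff_alpha_interval(ratings):
    """Krippendorff's alpha, interval metric, complete data.

    ratings: 2-D array, rows = cases (units), columns = raters.
    """
    data = np.asarray(ratings, dtype=float)
    values = data.ravel()
    n = values.size  # total number of pairable values
    m = data.shape[1]  # raters per case
    # Observed disagreement: squared differences over ordered rater pairs within each case
    within = (data[:, :, None] - data[:, None, :]) ** 2
    d_o = within.sum() / (m - 1) / n
    # Expected disagreement: squared differences over all ordered pairs of pooled values
    d_e = ((values[:, None] - values[None, :]) ** 2).sum() / (n * (n - 1))
    return 1.0 - d_o / d_e


# Three raters scoring four cases on the 1-5 scale (hypothetical ratings)
print(krippendorff_alpha_interval([[5, 5, 4], [2, 2, 2], [4, 3, 4], [1, 1, 2]]))
```

Alpha is 1 for perfect agreement, near 0 when agreement is at chance level, and negative when raters disagree more than chance would predict.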

7

Leveraging Machine Learning Approaches to Identify Health-Related Social Needs Screening from Electronic Health Records

Dojcsak, L.; Abegaz, T.; Islam, M.; Chandler, Y.; Maleku, A.; Doubeni, A.; Mohammed, B.; Langston, M. A.; Donneyong, M. M.

2026-06-26 health informatics 10.64898/2026.06.23.26356305 medRxiv

Top 0.1%

2.1%

Health-related social needs (HRSNs), such as housing instability, food insecurity, and transportation challenges, are nonmedical factors associated with poorer health and well-being. Screening for unmet HRSNs is a critical step towards identifying at-risk patients, but manual screening is resource intensive and often incomplete. We used Electronic Health Records (EHR) data to develop machine learning models to identify unmet HRSNs using a limited set of non-modifiable sociodemographic features available in EHRs. We included 745,975 patients screened for at least one HRSN using data from community health centers that participated in the OCHIN practice-based research network between 2016 and 2022. Logistic regression, random forest (RF), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM) algorithms were trained to predict unmet HRSNs. Model performance was evaluated using 10-fold cross-validation and area under the receiver operating characteristic curve (AUROC). For overall HRSN prediction, LightGBM (AUROC 64.5%, 95% CI: 64.3% to 64.7%) performed slightly better than logistic regression (61.4%), RF (63.7%), and XGBoost (60.3%). Similar performance was observed when predicting individual HRSNs. Model performance was modest; however, these results establish a benchmark for predictive performance achievable using only routinely available demographic data and provide a foundation for incorporating additional clinical and area-level social determinants of health data.
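AUROC can be read as the probability that a randomly chosen patient with an unmet need receives a higher score than a randomly chosen patient without one, so 64.5% means the model orders such a pair correctly about two times in three. A minimal sketch of that pairwise definition, plus a helper that splits indices into folds for 10-fold cross-validation, follows; model fitting is omitted, the folds are unstratified for brevity, and the function names are illustrative.

```python
import numpy as np


def auroc(y_true, scores):
    """AUROC as P(random positive outranks random negative), ties counted as 1/2."""
    y = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y], s[~y]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and split into k nearly equal folds (unstratified, for brevity)."""
    return np.array_split(rng.permutation(n), k)


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In cross-validation, each fold is held out once, the model is trained on the other nine, and AUROC is computed on the held-out predictions; the fold results are then summarized.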

8

Calibrating machine learning approaches for probability estimation without calibration data

Di Carluccio, E.; Koliopanos, G.; Ojeda, F. M.; Weimar, C.; Ziegler, A.

2026-07-13 epidemiology 10.64898/2026.07.10.26357723 medRxiv

Top 0.1%

2.1%

Statistical prediction models for binary outcomes are becoming increasingly popular. One significant challenge is calibrating these models to the characteristics of a target population that is structurally different from the original population. Calibration is especially challenging when no data from the target population are available. To address this problem, we propose a novel calibration method, SimCal, which uses synthetic data generated from the model development data in conjunction with marginal statistics from the calibration cohort. We show that expert judgment modeling (EJM) may be used for calibration if cross-sectional data from the target population are available comprising expert judgments about the potential outcome and the covariates. We describe three alternative calibration approaches when calibration data are lacking: similarity-binning averaging (SBA), adaptive calibration of predictions (ACP), and Elkan calibration. In a simulation study, we compare SBA, ACP, Elkan calibration, and SimCal. R code for applying these methods is provided, using a re-analysis of data on coronary artery disease. We illustrate all five calibration approaches with a real data set for predicting functional outcome after stroke, and all approaches except EJM in the re-analysis of the Cleveland Clinic data. No approach performed convincingly well in all situations. SimCal performed well when model parameters were correctly specified. EJM failed on the stroke data. Further research on calibration in the absence of calibration data is urgently required.
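"Elkan calibration" is named but not defined here. Assuming it refers to Elkan's prior-shift correction, it needs only the outcome prevalence in the target population: each predicted odds is multiplied by the ratio of target to development outcome odds. A minimal sketch under that assumption (the function name is illustrative):

```python
def elkan_prior_shift(p, base_rate_dev, base_rate_target):
    """Shift a predicted probability p (0 < p < 1) to a new outcome prevalence.

    Assumes only the outcome prevalence differs between populations, i.e.
    P(x | y) is unchanged: posterior odds are rescaled by the prior odds ratio.
    """
    odds = p / (1.0 - p)
    prior_odds_ratio = (base_rate_target / (1.0 - base_rate_target)) / (
        base_rate_dev / (1.0 - base_rate_dev)
    )
    new_odds = odds * prior_odds_ratio
    return new_odds / (1.0 + new_odds)


# A development-cohort prediction equal to the development prevalence maps to the target prevalence
print(elkan_prior_shift(0.2, base_rate_dev=0.2, base_rate_target=0.1))
```

This also shows why such a method cannot succeed everywhere: it corrects only a shift in prevalence and leaves the ranking of patients and any change in predictor effects untouched.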

9

An Agent-Based Modeling Framework for Healthcare AI Adoption: Application to Ambient Clinical Documentation

Crowson, M. G.

2026-07-02 health informatics 10.64898/2026.07.01.26357077 medRxiv

Top 0.2%

1.9%

Objective: To develop and demonstrate an agent-based modeling framework for healthcare AI adoption, using ambient clinical documentation as the calibration case. Materials and Methods: We built an agent-based model with 50,000 clinician agents, 500 organization agents, and 4 vendor agents over 104 weeks. Modeled clinicians differed by psychotype, specialty, and friction/benefit thresholds; modeled organizations progressed through deployment phases with governance delays anchored to 8-28 weeks. All cited deployment values were independently re-verified against primary sources, and the model was validated against published data from seven health systems and three benchmarks using formal goodness-of-fit metrics (RMSE, 90% predictive-interval coverage) grouped by reference class. After correcting an organization-initialization artifact, we performed a formal six-parameter re-calibration to align both the early-time trajectory and the steady-state plateau with published data. Six intervention scenarios were compared in paired simulations (n=30 realizations per scenario) using effect sizes with bootstrap intervals, and both the full intervention comparison and the Sobol sensitivity screen were re-run natively under the re-calibrated model. Global sensitivity analysis used Sobol indices (64 base samples; 1,152 parameter sets) across eight parameters. Results: Baseline simulations produced S-curve adoption trajectories with wide variability. The re-calibrated model reproduced both early-time single-site trajectories and the cross-sectional adoption plateau, covering 86% of reference-class-matched anchors at the nominal 90% level, versus 29% for the original configuration. Most intervention scenarios increased adoption; in the original configuration the combined intervention outperformed individual levers, with significant interactions confirmed by 2^3 factorial analysis.
Re-running the analyses natively under the calibrated model both confirmed and revised these conclusions: governance remained the largest single structural lever and non-success absorbing states remained prominent, but intervention effects attenuated sharply, the combined intervention no longer reliably exceeded the best single lever at operating scale, and the leading sensitivity driver shifted from governance delay to clinician friction/edit-rate tolerance. That calibration changes which levers appear influential is itself the central methodological finding. Organizational outcomes clustered into non-success absorbing states (pilot stagnation and failure) alongside success and scaling. Conclusions: Governance delay is an explicit upstream gate in the model, so its influence reflects model architecture and should not be interpreted as a universal real-world priority. The modeled pilot stagnation state is hypothesis-generating rather than an empirical category. Agent-based modeling provides a structured framework for understanding healthcare AI adoption dynamics. The approach supports hypothesis generation and comparative scenario exploration rather than point prediction.
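The Sobol design size can be checked arithmetically. Under Saltelli's sampling scheme with second-order indices, N base samples and D parameters require N(2D + 2) model runs, and 64 x (2 x 8 + 2) = 1,152, matching the abstract; this suggests, but does not confirm, that the authors used that scheme. A small sketch (the function name is illustrative):

```python
def saltelli_evaluations(n_base, n_params, second_order=True):
    """Model runs required by Saltelli sampling for Sobol indices."""
    if second_order:
        return n_base * (2 * n_params + 2)  # first-, second-, and total-order indices
    return n_base * (n_params + 2)  # first- and total-order indices only


print(saltelli_evaluations(64, 8))  # 1152
```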

10

Rationale and guidance for implementing the continual reassessment method for dose-finding in controlled human infection model studies

Weerasinghe, C.; Osowicki, J.; Simpson, J. A.; Crocker-Buque, T.; McCarthy, J.; Williams, E.; Price, D. J.

2026-07-17 infectious diseases 10.64898/2026.07.16.26358128 medRxiv

Top 0.2%

1.7%

Controlled human infection models (CHIMs) are increasingly used in infectious disease research to study pathogen dynamics and evaluate interventions under controlled conditions. However, these studies are resource-intensive and involve ethical and safety constraints, making efficient study design critical. Dose-finding is a key early component in CHIMs, where the aim is to identify a challenge dose that achieves a target infection probability. Traditional rule-based designs are commonly used but can be inefficient, motivating the use of model-based adaptive approaches such as the Bayesian Continual Reassessment Method (CRM). Although CRM has been extensively studied and widely adopted in Phase I oncology trials for identifying the maximum tolerated dose of therapeutics, its application in CHIM settings remains limited, particularly when the endpoint of interest is infection. This tutorial provides step-by-step guidance for implementing a Bayesian CRM in dose-finding CHIMs, using an oropharyngeal Neisseria gonorrhoeae challenge as a motivating case study. The framework outlines key design components, including dose-grid specification, the dose-response model, prior elicitation, Bayesian updating, decision rules, and stopping criteria, with particular emphasis on a clinically interpretable parameterisation. Trial operating characteristics are evaluated through simulation studies under multiple dose-response scenarios and prior-predictive analyses, and compared with a commonly used '3+3' type rule-based design. This work highlights the advantages of Bayesian model-based designs for dose-finding in CHIMs over classic rule-based designs and provides a structured, reproducible framework for implementing CRM, supporting its application in future CHIM studies.
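The Bayesian updating and decision steps can be sketched with the classic one-parameter power model, p_i = s_i^exp(a), where s_i is a prior "skeleton" guess of the infection probability at dose i and a has a normal prior. The posterior over a is computed on a grid, and the next recommended dose is the one whose posterior mean infection probability is closest to the target. The skeleton, the 0.3 target, and the prior variance of 1.34 (a common default in CRM software) are assumptions; the tutorial's clinically interpretable parameterisation, dose-skipping restrictions, and stopping rules are not reproduced here.

```python
import numpy as np


def crm_recommend(skeleton, n_treated, n_infected, target, prior_sd=np.sqrt(1.34)):
    """One-parameter power-model CRM: p_i = skeleton_i ** exp(a), a ~ Normal(0, prior_sd**2).

    Returns (index of recommended dose, posterior mean infection probability per dose).
    """
    skeleton = np.asarray(skeleton, dtype=float)
    n = np.asarray(n_treated, dtype=float)
    y = np.asarray(n_infected, dtype=float)
    a = np.linspace(-10.0, 10.0, 4001)  # uniform grid over the model parameter
    p = skeleton[None, :] ** np.exp(a)[:, None]  # p[g, i]: infection prob. at dose i
    p = np.clip(p, 1e-12, 1 - 1e-12)  # keep logs finite
    log_lik = (y * np.log(p) + (n - y) * np.log1p(-p)).sum(axis=1)  # binomial likelihood
    log_post = log_lik - 0.5 * (a / prior_sd) ** 2  # add normal log-prior
    w = np.exp(log_post - log_post.max())
    w /= w.sum()  # normalized posterior weights on the grid
    post_mean = w @ p
    return int(np.argmin(np.abs(post_mean - target))), post_mean


SKELETON = [0.05, 0.15, 0.30, 0.50, 0.70]  # hypothetical prior guesses per dose
dose, post = crm_recommend(SKELETON, [3, 3, 0, 0, 0], [0, 1, 0, 0, 0], target=0.3)
print(dose, np.round(post, 3))
```

Because all doses share one parameter, an infection observed at one dose updates the estimated probabilities at every dose, which is where the efficiency gain over '3+3' type rules comes from.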

11

Estimating vaccine-prevented disease outcomes when vaccination has only direct effects

Yang, F.; Magee, A.; Morris, S. E.; Mathis, S. M.; Wiegand, R.; Iuliano, D. A.; Biggerstaff, M.; Olesen, S. W.

2026-06-23 epidemiology 10.64898/2026.06.20.26356134 medRxiv

Top 0.2%

1.5%

Vaccination can be a useful intervention for reducing infectious disease burden. Estimating numbers of vaccine-prevented health outcomes is one approach to quantifying the benefits of vaccination. Here we improve a method described by Foppa et al. (1) that assumes vaccination has only direct effects, that is, it cannot prevent infection or onward transmission of the disease. We rederive this method and derive an improved method that increases estimation accuracy with minimal additional analytical complexity. To evaluate the improved method, we simulated disease outbreaks and compared the accuracy of the two methods for estimating prevented disease outcomes. In 84% of simulations performed over a wide parameter space, the improved method had an equal or smaller estimation error compared to the original Foppa method, with 7.9-fold smaller mean error and 44-fold smaller standard deviation of errors. Our study improves a method for estimating prevented burden when assuming vaccination has only direct effects.
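The core identity behind direct-effects-only estimates can be shown for a single period. If unvaccinated people have outcome risk r and vaccinated people r(1 - VE), then with coverage c the observed count is O = N r (1 - c VE), while without vaccination it would be N r, so prevented outcomes are O / (1 - c VE) - O. This sketch assumes homogeneous risk and a static population within the period; it is not the Foppa et al. procedure or the improved method described above, which operate over time.

```python
def prevented_outcomes_single_period(observed, coverage, ve):
    """Outcomes averted in one period, assuming vaccination has only direct effects.

    observed: outcomes observed in the whole population (vaccinated + unvaccinated)
    coverage: fraction vaccinated before the period
    ve: vaccine effectiveness against the outcome
    """
    reduction = coverage * ve  # population-level proportional reduction in risk
    expected_without_vaccination = observed / (1.0 - reduction)
    return expected_without_vaccination - observed


print(prevented_outcomes_single_period(900, coverage=0.5, ve=0.2))
```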

12

Two-Sample Instrumental Variables under Population Mismatch: A Transportability Framework with Bias Diagnostics

Qian, Y.; Song, Y.

2026-06-26 epidemiology 10.64898/2026.06.15.26355602 medRxiv

Top 0.2%

1.3%

Instrumental variable (IV) methods are widely used in health and social sciences to estimate causal treatment effects among compliers. In certain research settings, the instrument-treatment association (first stage) and the instrument-outcome association (reduced form) are each estimated from a different dataset. Two-Sample Instrumental Variables (TSIV), proposed by Angrist and Krueger (1992), addresses this by combining first-stage and reduced-form estimates from separate data sources into a single causal effect estimate. However, TSIV identification requires that instrument compliance behavior be consistent across the two samples, a condition that is rarely verified in practice. We show mathematically and empirically that when compliance differs between samples, the raw TSIV estimator does not converge to the true Local Average Treatment Effect (LATE) and instead attenuates toward a predictably biased limit proportional to the ratio of first-stage compliance rates between the two samples. To address this, we formalize a framework for estimating LATE with TSIV under two key assumptions: (1) Covariate Overlap, requiring that the two samples share sufficient common support in their covariate distributions, and (2) Compliance Transportability, requiring that compliance behavior is identical across populations after conditioning on observed covariates. We consider a setting in which a health policy instrument and outcomes are recorded in administrative claims while treatment and covariates are collected in a survey. We use a C-statistic derived from pooled covariates to detect population mismatch and an Inverse Probability Weighting (IPW) correction that reweights the first-stage sample to approximate the administrative covariate distribution. 
In Monte Carlo simulations across eight scenarios calibrated to a survey-Medicaid setting, IPW-TSIV reduces bias in estimating the LATE, achieving an 88% reduction in the primary scenario, 82% under severe selection, and 79% when state-level expansion policy drives compliance heterogeneity. We further validate this framework using the Oregon Health Insurance Experiment, where partitioning the public-use lottery data (N = 24,646) into two non-overlapping samples with substantively meaningful compliance heterogeneity yields a verifiable benchmark against the true causal effect. IPW-TSIV reduces mean absolute bias by 71.6% relative to the oracle S2-specific LATE across 10 independent replications (C-statistic = 0.78), outperforms naive TSIV in all 10 splits, and reduces mean bias relative to the full-data LATE from +0.016 to +0.008. This framework provides applied researchers with actionable diagnostic thresholds to detect sample mismatch, validate transportability assumptions, and determine when structural TSIV estimation is reliable.
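The mismatch bias follows from the Wald form of TSIV: the estimate is the reduced form from one sample divided by the first stage from the other. If the outcome sample has compliance rate pi_2 and the treatment sample pi_1, the reduced form is roughly pi_2 x LATE, so the ratio tends to LATE x pi_2 / pi_1. The sketch below assumes a binary instrument and treatment; the IPW weights are hypothetical inputs (in practice they would come from a model of sample membership), and the function names are illustrative.

```python
import numpy as np


def weighted_first_stage(d, z, w=None):
    """Difference in (optionally weighted) treatment rates between instrument arms."""
    d = np.asarray(d, dtype=float)
    z = np.asarray(z, dtype=int)
    w = np.ones_like(d) if w is None else np.asarray(w, dtype=float)
    treated_rate_z1 = np.average(d[z == 1], weights=w[z == 1])
    treated_rate_z0 = np.average(d[z == 0], weights=w[z == 0])
    return treated_rate_z1 - treated_rate_z0


def tsiv(reduced_form, first_stage):
    """Two-sample IV (Wald) estimate: reduced form from one sample, first stage from another."""
    return reduced_form / first_stage


# Hypothetical numbers: true LATE = 2, outcome-sample compliance 0.25, treatment-sample compliance 0.5
reduced_form_s2 = 0.25 * 2.0
print(tsiv(reduced_form_s2, 0.5))   # naive TSIV: attenuated by 0.25 / 0.5
print(tsiv(reduced_form_s2, 0.25))  # first stage matched to the outcome sample
```

Reweighting the first-stage sample so that its covariate mix resembles the outcome sample aims to move the denominator from pi_1 toward pi_2, which is valid only under the covariate overlap and compliance transportability assumptions stated above.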

13

Operating characteristics of analysis methods for clinical trials in viral respiratory disease: A simulation study protocol

Schwenke, J. M.; Herkner, F.; Kayembe, M. T.; Olsen, I. C.; Briel, M.; König, F.

2026-06-29 infectious diseases 10.64898/2026.06.24.26356481 medRxiv

Top 0.2%

1.2%

Acute viral respiratory infections (ARVIs) are a major cause of hospitalization and death worldwide, yet randomized clinical trials in this setting face substantial challenges in selecting efficient and clinically meaningful primary endpoints. Mortality is often too infrequent to serve as a feasible primary endpoint. Several alternative approaches have been proposed, including ordinal scales, time-to-event endpoints, recovery-based composite outcomes, and longitudinal ordinal models. However, their comparative operating characteristics under realistic ARVI disease courses remain insufficiently understood. We describe a simulation study to compare the type I error and power of commonly used and recently proposed endpoints and analysis strategies for two-arm randomized trials in hospitalized participants with ARVIs. Data will be generated under several mechanisms designed to mimic plausible participant trajectories, including a latent Brownian motion process, a first-order ordinal Markov process, a latent recurrent-event process with frailty, and resampling from individual participant data from the ACTT-2 trial. Simulated outcomes will use 4-, 6-, and 8-level ordinal severity scales and will reflect moderately and severely ill populations, follow-up horizons of 28 or 60 days, varying treatment effects, and sample sizes. Methods to be compared include Markov ordinal state transition models, proportional-odds models at a fixed time point, days-to-recovery scale analyses, Cox models for time-to-event endpoints, logistic regression for binary endpoints, generalized pairwise comparisons for hierarchical composites, and t-tests for days alive and out of hospital. This study will provide a systematic comparison of endpoint definitions and analysis methods for ARVI trials under clinically motivated data-generating mechanisms. 
The results are intended to inform the selection of feasible, interpretable, and statistically efficient primary analysis strategies for future trials in viral respiratory disease.
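One of the planned data-generating mechanisms, a first-order ordinal Markov process, can be sketched as a daily chain in which tomorrow's severity depends only on today's. The 4-level coding and every transition probability below are hypothetical illustrations, not values from the protocol; the helper derives a days-alive-and-out-of-hospital outcome from the simulated trajectory.

```python
import numpy as np

# Hypothetical 4-level scale: 0 = home/recovered, 1 = ward, 2 = ICU, 3 = dead.
# Daily transition probabilities below are illustrative only.
P_EXAMPLE = np.array([
    [1.00, 0.00, 0.00, 0.00],  # recovered treated as absorbing (an assumption)
    [0.15, 0.78, 0.05, 0.02],
    [0.00, 0.10, 0.85, 0.05],
    [0.00, 0.00, 0.00, 1.00],  # death is absorbing
])


def simulate_ordinal_markov(P, start_state, n_days, rng):
    """Simulate one daily trajectory from a first-order Markov chain on ordinal states."""
    P = np.asarray(P, dtype=float)
    if not np.allclose(P.sum(axis=1), 1.0):
        raise ValueError("each row of P must sum to 1")
    states = [int(start_state)]
    for _ in range(n_days):
        states.append(int(rng.choice(len(P), p=P[states[-1]])))
    return np.array(states)


def days_alive_out_of_hospital(trajectory, home_state=0):
    """Count follow-up days (excluding day 0) spent in the home state."""
    return int(np.sum(np.asarray(trajectory)[1:] == home_state))


rng = np.random.default_rng(7)
traj = simulate_ordinal_markov(P_EXAMPLE, start_state=1, n_days=28, rng=rng)
print(traj, days_alive_out_of_hospital(traj))
```

A treatment effect could be introduced, for example, by shifting probability mass in the treated arm's transition matrix toward lower (better) states; how the protocol parameterises this is not specified in the abstract.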

14

Multilevel Factors Associated with Nonresponse to Patient-Reported Outcome Measures in Routine Radiation Oncology Care

Liu, J. B.; Chen, Y.-J.; Edelen, M. O.; Pusic, A. L.; Martin, N. E.; Zeng, C.

2026-07-17 health systems and quality improvement 10.64898/2026.07.15.26358162 medRxiv

Top 0.3%

1.1%

Purpose: Nonresponse to routinely collected patient-reported outcome measures (PROMs) threatens the representativeness of aggregated data. We characterized patient-, provider-, and clinic-level factors associated with PROMIS Global-10 nonresponse in routine radiation oncology care. Methods: In this retrospective cohort study, all adults seen at five Mass General Brigham radiation oncology clinics over one year were included. The primary outcome was patient-level nonresponse, defined as never completing the portal-administered Global-10 versus completing it at least once. Using iterative mixed-effects logistic regression, we modeled patient-, provider-, and clinic-level factors. Results: Among 12,214 patients, 71 providers, and five clinics, patient- and appointment-level response rates were 35.4% and 10.9%, respectively, with patient-level response ranging nearly fivefold across clinics (12.8% to 66.2%). In Model 1, male sex, lower education, not working, and recent surgery were associated with higher odds of nonresponse, and longer time since diagnosis with lower odds. After provider- and clinic-level factors were added, sex, education, and employment became nonsignificant, whereas recent surgery (adjusted odds ratio [aOR] 1.97) and longer time since diagnosis (aOR 0.46 for >12 months) remained significant. A provider's historical collection rate was protective, but this association attenuated once clinic-level factors were added. At the clinic level, a later program launch (aOR 0.29) and a higher historical collection rate (aOR 0.79) were associated with lower nonresponse, whereas academic versus community setting was not. Conclusions: Nonresponse to routinely collected PROMs is a multilevel phenomenon driven substantially by clinic-level implementation factors, not patient characteristics alone. Because response rate is only a proxy for representativeness, PROMs programs and PRO-based performance measures should prioritize representative collection over volume.
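The gap between the patient-level (35.4%) and appointment-level (10.9%) response rates follows from the definitions: a patient counts as a responder after a single completion, whereas every uncompleted appointment lowers the appointment-level rate. A small sketch of both calculations from appointment records, using hypothetical data:

```python
from collections import defaultdict


def response_rates(appointments):
    """appointments: iterable of (patient_id, completed) pairs, one per appointment.

    Returns (patient-level rate, appointment-level rate), where a patient counts
    as a responder if they completed the measure at least once.
    """
    appointments = list(appointments)
    responded = defaultdict(bool)
    for patient_id, completed in appointments:
        responded[patient_id] = responded[patient_id] or bool(completed)
    patient_rate = sum(responded.values()) / len(responded)
    appointment_rate = sum(bool(c) for _, c in appointments) / len(appointments)
    return patient_rate, appointment_rate


visits = [("p1", True), ("p1", False), ("p1", False), ("p2", False)]
print(response_rates(visits))  # (0.5, 0.25)
```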

15

Modeling effect of hypertension control on death, incidence of atrial fibrillation and economic impact to Medicare and hospitals.

Williams, J.; Mencer, N.; Mak, W. Y.; Dalle Luche, G.; Dundovic, S.

2026-07-17 health systems and quality improvement 10.64898/2026.07.15.26358198 medRxiv

Top 0.3%

1.1%

Background Hypertension is a major modifiable risk factor for atrial fibrillation (AF), yet blood pressure (BP) control remains suboptimal in older U.S. adults. Objectives This study evaluated how improved systolic BP (SBP) control could affect incident AF, downstream AF ablation demand, Medicare savings, and hospital revenue. Methods A population-based modelling framework was developed to estimate mortality and incident AF hazards across SBP strata: <120, 120-139, 140-159, and ≥160 mmHg. AF incidence in the SBP <120 mmHg group was set at 2.2 per 1,000 person-years, with hazard ratios of 1.17, 1.42, and 1.64 applied to the higher SBP strata. We assumed that 25% of patients with incident AF would undergo ablation, with a 7.2% complication rate. AF prevalence was projected to increase by 4.6% annually over 10 years. Medicare savings and forgone hospital revenue were estimated under varying procedure cost and contribution-margin assumptions. Results Higher SBP was associated with greater hazards of death and incident AF. Improved SBP control reduced projected AF incidence and ablation demand. Over 10 years, cumulative Medicare savings were projected at $8.7B-$10.9B across the full modelled population. However, reduced ablation volume translated into forgone hospital revenue, ranging from $75M to $377M in the first year and approximately $1.03B-$5.2B cumulatively over 10 years. Conclusions Improved SBP control may reduce AF incidence, prevent avoidable invasive ablation procedures, relieve pressure on surgical waitlists, and generate substantial Medicare savings. However, these benefits may reduce hospital procedural revenue, highlighting a misalignment between prevention-oriented care and fee-for-service reimbursement incentives.
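The stated inputs imply a simple per-stratum cascade from person-years to incident AF, ablations, and complications. This sketch assumes constant hazards, so each hazard ratio multiplies the baseline rate directly, and it ignores competing mortality, the 4.6% annual prevalence growth, and all cost terms; it shows the arithmetic, not the authors' full model.

```python
BASELINE_AF_PER_1000_PY = 2.2  # SBP <120 mmHg group
HAZARD_RATIOS = {"<120": 1.0, "120-139": 1.17, "140-159": 1.42, ">=160": 1.64}
ABLATION_SHARE = 0.25  # share of incident AF patients undergoing ablation
COMPLICATION_RATE = 0.072  # complications per ablation


def af_cascade(person_years, stratum):
    """Expected incident AF, ablations, and ablation complications for one SBP stratum."""
    rate = BASELINE_AF_PER_1000_PY * HAZARD_RATIOS[stratum] / 1000.0  # constant-hazard assumption
    incident_af = person_years * rate
    ablations = incident_af * ABLATION_SHARE
    complications = ablations * COMPLICATION_RATE
    return incident_af, ablations, complications


for stratum in HAZARD_RATIOS:
    print(stratum, [round(x, 1) for x in af_cascade(100_000, stratum)])
```

Moving a person-year from the >=160 stratum to <120 changes the expected AF count by (1.64 - 1.0) x 2.2 / 1,000 under these assumptions, and the ablation and complication counts scale proportionally.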

16

Development and Internal Validation of a County-Level Screening Index for Postpartum Medicaid Access Barriers

Howard, C.; Shekhar, P.

2026-07-07 health policy 10.64898/2026.07.05.26357332 medRxiv

Top 0.3%

1.1%


Background: Postpartum Medicaid coverage and support are central maternal health policy issues, but county-level tools for identifying where postpartum Medicaid populations may face overlapping administrative, clinical, and contextual access barriers remain limited. Methods: We developed and internally validated a county-level Postpartum Medicaid Access Barrier Index for all 3,144 counties and county equivalents in the 50 states and District of Columbia. Public data sources included geocoded Medicaid office locations from Shafer et al. (2024), U.S. Census county boundaries, American Community Survey 2024 5-year county indicators, the National Center for Health Statistics 2023 Urban-Rural Classification Scheme for Counties, and county-level hospital-based obstetric care status from the University of Minnesota Rural Health Research Center. Medicaid office locations were spatially assigned to counties, then merged with ACS indicators, rurality, and obstetric care status by county FIPS. The theoretical score range was 0-11; the index assigned higher weights to two core infrastructure measures and lower weights to contextual indicators. Internal validation assessed component structure, known-groups validity, geographic clustering, weighting sensitivity, added value over simpler infrastructure screens, and separation across concern levels. Results: Across 3,144 counties, observed scores ranged from 0 to 10 on the theoretical 0-11 scale, with a mean of 3.65 and median of 3. High or highest concern counties accounted for 665 counties (21.2%), including 56 counties (1.8%) in the highest concern group. Component correlations were low-to-moderate, with an average absolute phi of 0.176 and no pairwise component correlation at or above 0.50. Known-groups validity was strong: dual administrative and clinical gap counties scored 4.43 points higher than counties with neither gap (Cohen's d = 3.28, p < 0.001). 
Scores were geographically clustered (Moran's I = 0.375, permutation p = 0.005). A dual-gap-only screen captured 386 of 665 high/highest concern counties (58.0%) but missed 279 high/highest counties; a parsimonious rule requiring one infrastructure gap plus at least four contextual flags recovered 265 of these 279 missed counties (95.0%) with 100.0% precision. Discussion: The Postpartum Medicaid Access Barrier Index provides a transparent county-level screening tool for identifying places where administrative, clinical, and contextual barriers may overlap for postpartum Medicaid populations and should be externally validated against Medicaid enrollment, renewal, churn, coverage continuity, and postpartum care outcomes.
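The scoring and screening logic can be sketched in Python. The abstract does not publish the weights, so the values below (2 per core infrastructure gap, 1 per contextual flag, seven contextual flags) are hypothetical, chosen only because they reproduce the stated 0-11 range; the gap definitions in the comments are likewise assumptions.

```python
from dataclasses import dataclass

# HYPOTHETICAL weights: the abstract states only that two core infrastructure
# measures outweigh the contextual indicators on a 0-11 scale.
CORE_WEIGHT = 2
CONTEXT_WEIGHT = 1
N_CONTEXT_FLAGS = 7  # 2*2 + 7*1 = 11, the theoretical maximum


@dataclass
class County:
    fips: str
    admin_gap: bool      # assumed: no accessible Medicaid office
    clinical_gap: bool   # assumed: no hospital-based obstetric care
    context_flags: int   # number of contextual indicators flagged (0..7)


def score(c: County) -> int:
    """Weighted additive index under the hypothetical weights above."""
    if not 0 <= c.context_flags <= N_CONTEXT_FLAGS:
        raise ValueError("context_flags out of range")
    return CORE_WEIGHT * (int(c.admin_gap) + int(c.clinical_gap)) + CONTEXT_WEIGHT * c.context_flags


def dual_gap_screen(c: County) -> bool:
    """Simple infrastructure screen: both core gaps present."""
    return c.admin_gap and c.clinical_gap


def parsimonious_rule(c: County) -> bool:
    """One infrastructure gap plus at least four contextual flags."""
    return (c.admin_gap or c.clinical_gap) and c.context_flags >= 4


def recall_precision(flagged: set[str], truth: set[str]) -> tuple[float, float]:
    """Share of target counties captured, and share of flagged counties that are targets."""
    tp = len(flagged & truth)
    precision = tp / len(flagged) if flagged else float("nan")
    return tp / len(truth), precision


if __name__ == "__main__":
    example = County("00000", admin_gap=True, clinical_gap=False, context_flags=4)
    print(score(example), dual_gap_screen(example), parsimonious_rule(example))
```

With the reported counts, the dual-gap screen's recall is 386/665 ≈ 0.580, and the parsimonious rule recovers 265/279 ≈ 0.950 of the counties that screen missed.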

17

PORTRAIT: a calibrated patient Passport with built-in refusal - describing individuals against a reference population

Oehring, D.

2026-07-15 health informatics 10.64898/2026.07.13.26357968 medRxiv

Top 0.3%

1.0%


Background Average-based summaries serve individual patients poorly. PORTRAIT is a calibrated, abstention-aware tool that describes where one patient sits relative to a reference population across 12 cardiometabolic markers, how confident that placement is, and which features drive it. PORTRAIT describes; it does not diagnose or predict. Abstention is a designed feature, given the known limits of conditional coverage. Methods Conformal calibration was combined with distribution-free coverage bounds, quantile-regression coordinates, and copula-based joint structure. A frozen reference cohort (n=9,421) supplied fixed calibration; a held-out cohort (n=2,247) tested transportability across six strata. A release gate required the minimum per-slice coverage to hold across 4 of 5 seeds. Coverage was retested under survey weighting to the US adult population. Coherence was reported as a descriptive joint coordinate. Discrimination was summarised with Harrell's C, and multiplicity was controlled by BH-FDR. Interface conformance was assessed against defined requirements, Nielsen heuristics, and WCAG 2.2 AA, with attention to automation bias and risk-graph design. Results The frozen reference held all six strata within the band (0.864-0.903) at abstention 0.113, whereas a re-split undercovered to 0.71 at abstention 0.227; coverage survived survey weighting. The release gate passed on 4 of 5 seeds at abstention 0.101 against a nominal 0.90, and in the frozen-reference configuration that ships, all six strata held inside the calibrated band (0.864-0.903). Coherence showed orthogonality of 0.444 to raw extremity and correlated 0.892 with a copula-Mahalanobis distance while remaining deliberately non-identical, so it adds per-feature information. Two transfer tests returned negatives; the ocular transfer did not hold coverage at thin n. Adding coherence changed mortality discrimination by ΔC = 0.0047. Interface requirements moved from 14/27/18 to 38/14/7 (MET/PARTIAL/UNMET); the Nielsen severity review resolved 7 of 10 issues; WCAG 2.2 AA text criteria passed.
Conclusions PORTRAIT situates a patient against a frozen reference, holds coverage under survey weighting to the US adult population, and abstains when calibration cannot be supported. The headline result is that the frozen reference held coverage where a re-split did not.
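Generic split-conformal machinery of the kind the Methods name can be sketched in Python. This is not PORTRAIT's implementation: the absolute-residual scores, the rule of abstaining when the finite-sample quantile is undefined, and all function names are assumptions for illustration. The sketch shows the conformal quantile, per-slice empirical coverage, and a 4-of-5-seed release gate.

```python
import math

import numpy as np


def conformal_quantile(scores: np.ndarray, alpha: float) -> float:
    """Finite-sample split-conformal quantile of calibration scores.

    Returns inf when the calibration set is too small to support 1 - alpha
    coverage, which the caller can treat as a signal to abstain.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return float(np.sort(scores)[k - 1])


def slice_coverage(y, lo, hi, slices) -> dict:
    """Empirical coverage of the intervals [lo, hi] within each slice label."""
    y, lo, hi, slices = map(np.asarray, (y, lo, hi, slices))
    covered = (y >= lo) & (y <= hi)
    return {s: float(covered[slices == s].mean()) for s in np.unique(slices)}


def release_gate(min_cov_per_seed, lower_band: float, required: int = 4) -> bool:
    """Pass if the minimum per-slice coverage reaches the band in >= `required` seeds."""
    return sum(c >= lower_band for c in min_cov_per_seed) >= required


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cal_scores = np.abs(rng.normal(size=500))      # |y - prediction| on calibration data
    q = conformal_quantile(cal_scores, alpha=0.10)
    y_test = rng.normal(size=3000)
    pred = np.zeros_like(y_test)
    slices = rng.integers(0, 6, size=y_test.size)  # six hypothetical strata
    cov = slice_coverage(y_test, pred - q, pred + q, slices)
    print(q, min(cov.values()))
```

Returning an infinite quantile for a tiny calibration set is one simple way to make abstention explicit at thin n, rather than reporting an interval whose coverage cannot be guaranteed.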

18

Selective prediction as a triage gate for primary-care depression screening: quantifying and mitigating selection bias in CHARLS-2011

Wang, Z.; Liu, Y.

2026-07-20 health informatics 10.64898/2026.07.17.26357845 medRxiv

Top 0.3%

1.0%


Background Primary care in China lacks structured mental-health assessment, and the machine-learning models that could support such screening are typically developed on heavily selected samples. Cumulative inclusion and exclusion criteria, though usually treated as neutral data-cleaning steps, can create heterogeneity in predictive reliability among retained participants. Using the China Health and Retirement Longitudinal Study (CHARLS) 2011 baseline, we quantified how selection funnels distort epidemiological associations and inflate machine-learning metrics, and tested selective prediction as a mitigation. Methods Using the CHARLS 2011 baseline with temporal external validation in CHARLS-2018, we built a four-level selection funnel (L0-L3), evaluated five classifiers with nested cross-validation and SMOTE, and compared model-embedded uncertainty with a decoupled predictor-selector framework; XGBoost cross-validation residuals drove risk stratification and classification and regression tree (CART) rules. Results Sample sizes fell from L0 n=17,705 to L3 n=4,256 (24.0% retained). The cancer-depression odds ratio attenuated from 1.78 (95% CI 1.32-2.41) to 1.39 (0.74-2.63), losing significance. AUC rose with selection, but the increase did not survive multiple-comparison correction, whereas calibration error increased for four of five models. Model-embedded uncertainty succeeded only for XGBoost; with the decoupled XGBoost residual selector, all five models achieved selective prediction at approximately 20% coverage (test AUC 0.90, 95% CI 0.85-0.95), abstaining on approximately 80% of cases for individual safety. Risk stratification was stable (residual Spearman correlations >0.95; multi-seed Jaccard 0.88), and CART rules used self-rated health, education, pain, and marital status. 
Conclusions The findings support a deployable primary-care triage pathway: a four-variable rule identifies patients suitable for algorithm-assisted scoring (approximately 20% coverage) and routes the remainder to human evaluation. Methodologically, cumulative selection bias produces a dual distortion: epidemiological associations are compressed and machine-learning metrics inflated. Selective prediction is limited mainly by uncertainty-indicator design. Performance metrics should be reported with selection level, coverage, and calibration trajectory. Decoupled selective prediction with CART rule extraction provides an actionable framework for quality-controlled, tiered-care deployment. Keywords: selective prediction, selection bias, CHARLS, depression, predictor-selector decoupling, uncertainty quantification, classification and regression tree, triage, clinical decision support, health management.
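A minimal Python sketch of decoupled selective prediction: the predictor's probabilities are evaluated only on the cases that a separate selector ranks as least uncertain, and the rest are deferred. The selector score is a hypothetical input (for example, an out-of-fold residual estimate from a second model, loosely mirroring the XGBoost residual selector described above); AUC uses the Mann-Whitney formulation. Function names are illustrative.

```python
import numpy as np


def auc(y: np.ndarray, p: np.ndarray) -> float:
    """ROC AUC via the Mann-Whitney U statistic (ties count as half)."""
    pos, neg = p[y == 1], p[y == 0]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


def select(uncertainty: np.ndarray, coverage: float) -> np.ndarray:
    """Boolean mask keeping the `coverage` fraction with the lowest selector uncertainty."""
    n_keep = int(round(coverage * len(uncertainty)))
    keep = np.zeros(len(uncertainty), dtype=bool)
    keep[np.argsort(uncertainty, kind="stable")[:n_keep]] = True
    return keep


def selective_report(y, p, uncertainty, coverage: float = 0.20) -> dict:
    """Score the predictor only where the decoupled selector accepts the case."""
    y, p, u = map(np.asarray, (y, p, uncertainty))
    keep = select(u, coverage)
    return {
        "coverage": float(keep.mean()),
        "auc_retained": auc(y[keep], p[keep]),
        "auc_all": auc(y, p),
        "n_deferred": int((~keep).sum()),  # routed to human evaluation
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 1000
    y = rng.integers(0, 2, size=n)
    difficulty = rng.uniform(0, 1, size=n)                 # hypothetical per-case difficulty
    p = np.clip(y + rng.normal(0, 0.1 + difficulty, size=n), 0, 1)
    selector = difficulty + rng.normal(0, 0.05, size=n)    # decoupled uncertainty estimate
    print(selective_report(y, p, selector, coverage=0.20))
```

Reporting `coverage` next to `auc_retained` follows the abstract's recommendation that performance metrics be stated together with coverage.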

19

its2s: a Python package for two-stage interrupted time series analysis using machine learning

Wilner, L.; Casey, J. A.; Mooney, S. J.; Do, V.; Ma, Y.; Benmarhnia, T.; Dey, A. K.

2026-07-06 epidemiology 10.64898/2026.07.02.26357175 medRxiv

Top 0.3%

1.0%


When randomized controlled trials are infeasible, researchers may leverage natural experiments for causal inference. Interrupted time-series (ITS) designs compare observed post-event trends to counterfactual predictions from pre-event data. Two-stage ITS designs use flexible models to generate optimized counterfactual predictions in the first stage, then estimate intervention effects by comparing observed to predicted outcomes in the second stage. Fitting high-dimensional versions of these models is challenging, requiring systematic infrastructure to ensure rigor and reproducibility. In response, we developed its2s, an open-source Python package implementing the two-stage ITS design with machine learning. its2s allows users to specify an intervention date and training/testing periods, select among built-in model architectures (e.g., Prophet-XGBoost, NeuralProphet), and generate confidence intervals via moving block bootstrap, preserving temporal autocorrelation in residuals. its2s layers defaults, configuration files, and runtime overrides to support workflows ranging from rapid default implementations to highly tailored analyses. We validated its2s using two case studies: a simulation with a 12% policy effect, recovering the true effect as 11.77%, and an analysis of the 2021 Pacific Northwest heat dome, finding 53% excess injury mortality over the following three weeks. its2s provides a flexible, reproducible framework for ITS-based quasi-experimental research, lowering barriers to rigorous machine learning-based counterfactual modeling.
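The two-stage logic can be illustrated without its2s itself. The Python sketch below does not use or reproduce the its2s API: it substitutes an ordinary least-squares trend-plus-seasonality model for the package's Prophet-XGBoost or NeuralProphet first stage, and the way it applies the moving block bootstrap (resampling contiguous blocks of pre-period residuals to perturb the counterfactual) is an assumption.

```python
import numpy as np


def design(t: np.ndarray, period: float = 7.0) -> np.ndarray:
    """Intercept, linear trend, and one seasonal harmonic (stand-in first-stage model)."""
    w = 2 * np.pi * t / period
    return np.column_stack([np.ones_like(t, dtype=float), t, np.sin(w), np.cos(w)])


def two_stage_its(y, intervention: int, n_boot: int = 500, block: int = 14, seed: int = 0):
    """Return the relative post-period effect and a percentile bootstrap 95% interval."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    pre, post = slice(0, intervention), slice(intervention, len(y))
    if block > intervention:
        raise ValueError("block length exceeds the pre-period")

    # Stage 1: fit on the pre-period only, then forecast the counterfactual
    X = design(t)
    beta, *_ = np.linalg.lstsq(X[pre], y[pre], rcond=None)
    counterfactual = X[post] @ beta
    resid = y[pre] - X[pre] @ beta

    # Stage 2: compare observed with predicted outcomes in the post-period
    effect = y[post].sum() / counterfactual.sum() - 1

    # Moving block bootstrap: contiguous residual blocks keep short-range autocorrelation
    rng = np.random.default_rng(seed)
    h = len(counterfactual)
    n_blocks = int(np.ceil(h / block))
    boot = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, len(resid) - block + 1, size=n_blocks)
        noise = np.concatenate([resid[s:s + block] for s in starts])[:h]
        boot[b] = y[post].sum() / (counterfactual + noise).sum() - 1
    lo, hi = np.percentile(boot, [2.5, 97.5])
    return effect, (lo, hi)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.arange(196, dtype=float)
    y = 100 + 0.2 * t + 5 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 2, size=t.size)
    y[140:] *= 1.12  # simulated 12% post-intervention increase
    est, (lo, hi) = two_stage_its(y, intervention=140)
    print(f"effect={est:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

Swapping the least-squares stage for a more flexible learner changes only the first stage; the second-stage comparison and the block resampling stay the same.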

20

Mammography Access, Urbanicity, and Late-Stage Breast Cancer Burden Across Texas: A Bayesian Spatial Analysis

Gao, J.; Zhang, Y.; Tian, J.; Ferguson, G. M.; Griffin, B. C.; Windett, J. H.

2026-07-15 health policy 10.64898/2026.07.11.26357817 medRxiv

Top 0.3%

1.0%


Background Geographic disparities in access to preventive healthcare services remain an important contributor to breast cancer inequities in the United States. Mammography screening plays a critical role in early detection and improved survival; however, screening infrastructure and healthcare accessibility remain unevenly distributed across many regions, particularly within large and socioeconomically diverse states such as Texas. Understanding the spatial relationships among mammography access, urbanicity, socioeconomic vulnerability, and late-stage breast cancer burden is important for developing geographically targeted public health interventions. Methods This study integrated multiple county-level datasets, including mammography facility locations from the Texas Cancer Information database, late-stage breast cancer burden data from the National Cancer Institute, and socioeconomic indicators from census-derived datasets and the Centers for Disease Control and Prevention Social Vulnerability Index. Geographic information systems (GIS), spatial autocorrelation analyses, and Bayesian spatial epidemiologic methods were used to evaluate geographic patterns across Texas counties from 2018 to 2022. Global and local Moran's I statistics were calculated to assess spatial clustering patterns. Bayesian spatial Poisson conditional autoregressive (CAR) regression models were subsequently estimated to examine associations between mammography center density, population density, female socioeconomic characteristics, and late-stage breast cancer burden while accounting for residual spatial dependence. Results Significant positive spatial autocorrelation was observed for county-level late-stage breast cancer burden across Texas counties. Mammography facilities were heavily concentrated within major metropolitan regions, while many rural counties demonstrated comparatively limited screening infrastructure availability. 
Bayesian spatial regression analyses demonstrated that log-transformed population density was significantly inversely associated with the burden of late-stage breast cancer (β = -0.136, 95% CrI [-0.175, -0.103]), indicating that less densely populated counties experienced greater burden than more urbanized counties. Mammography center density showed a borderline inverse association with late-stage burden (β = -0.008, 95% CrI [-0.017, 0.002]), suggesting that greater availability of screening infrastructure may contribute to reduced burden. Persistent residual spatial dependence remained across counties (ρ = 0.472, 95% CrI [0.038, 0.919]), indicating ongoing geographic clustering beyond measured explanatory variables. Conclusions Substantial geographic disparities in late-stage breast cancer burden, mammography access, and socioeconomic vulnerability exist across Texas counties. The findings suggest that urbanicity and screening infrastructure availability play important roles in shaping geographic inequities in breast cancer outcomes. Public health interventions should move beyond increasing facility availability alone and instead incorporate geographically targeted strategies that address rural healthcare access limitations, healthcare infrastructure disparities, and broader structural barriers to preventive screening services.
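The global Moran's I named in the Methods can be computed directly. A minimal Python sketch with a binary contiguity matrix and a one-sided permutation test; it does not reproduce the Bayesian CAR regression, and the toy adjacency below is hypothetical.

```python
import numpy as np


def morans_i(x, w) -> float:
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()
    n, s0 = len(x), w.sum()
    return float(n / s0 * (z @ w @ z) / (z @ z))


def permutation_p(x, w, n_perm: int = 999, seed: int = 0) -> float:
    """One-sided pseudo p-value for positive spatial autocorrelation."""
    rng = np.random.default_rng(seed)
    observed = morans_i(x, w)
    perms = np.array([morans_i(rng.permutation(x), w) for _ in range(n_perm)])
    return float((np.sum(perms >= observed) + 1) / (n_perm + 1))


if __name__ == "__main__":
    # Hypothetical: 10 counties in a row, each adjacent to its immediate neighbours
    n = 10
    w = np.zeros((n, n))
    for i in range(n - 1):
        w[i, i + 1] = w[i + 1, i] = 1.0
    burden = np.arange(n, dtype=float)  # smoothly increasing values cluster spatially
    print(morans_i(burden, w), permutation_p(burden, w, n_perm=199))
```

With `n_perm` permutations the smallest attainable p-value is 1/(n_perm + 1), for example 0.005 with 199 permutations. Row-standardized weights, which many GIS tools apply by default, would change the value of I but not the procedure.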